Start with data containing PM2.5, anomalous AOT, smoke day, height in meters above ground level, and pressure in hPa. Data is the result of matching 72-hour trajectories initialized from HYSPLIT points in California 2020 following Brey et al. with EPA station-day-PM2.5 data and then merging with 10 km grid of AOT, mean AOT, and smoke day. Anomalous AOT is AOT - mean AOT.
Aggregation method and cutoff determine how trajectory points were matched to EPA stations.
# Choose aggregation method: nn, idw, or k (idw of knn)
agg <- "idw"
# Choose cutoff: 10, 30, 50, 80, or 100 km
cut <- 30
# Configure data frame based on the above
dat_merged1 <- dat_merged %>% filter(agg_method == agg, cutoff == cut)
We model PM2.5
df <- dat_merged1
mdl0 <- lm(pm25 ~ aot_anom + aot_anom:smoke_day,
data = df)
mdl1 <- lm(pm25 ~ aot_anom + aot_anom:smoke_day + aot_anom:smoke_day:height,
data = df)
mdl2 <- lm(pm25 ~ aot_anom + aot_anom:smoke_day + aot_anom:smoke_day:pressure,
data = df)
Model with just anomalous AOT and smoke day does better than models with height or pressure. Based on the last PM2.5 modeling notebook I saw, I think these R2 values are higher? All terms are significant, but effect sizes are small for terms with height or pressure (or maybe they’re not small and it’s just an artifact of units measuring height or pressure). Effect direction for height follows what I would expect, but not pressure (hope I’m recalling science correctly here).
| Dependent variable: | |||
| pm25 | |||
| (1) | (2) | (3) | |
| aot_anom | 4.765*** | 5.167*** | 5.430*** |
| (1.040) | (1.512) | (1.515) | |
| aot_anom:smoke_day | 46.930*** | 52.769*** | 63.912*** |
| (1.054) | (1.588) | (2.427) | |
| aot_anom:smoke_day:height | -0.004*** | ||
| (0.0003) | |||
| aot_anom:smoke_day:pressure | -0.024*** | ||
| (0.002) | |||
| Constant | 7.579*** | 8.197*** | 8.048*** |
| (0.071) | (0.120) | (0.121) | |
| Observations | 92,536 | 52,928 | 52,928 |
| R2 | 0.415 | 0.400 | 0.398 |
| Adjusted R2 | 0.415 | 0.400 | 0.397 |
| Residual Std. Error | 20.818 (df = 92533) | 25.792 (df = 52924) | 25.839 (df = 52924) |
| F Statistic | 32,802.810*** (df = 2; 92533) | 11,745.030*** (df = 3; 52924) | 11,639.840*** (df = 3; 52924) |
| Note: | p<0.1; p<0.05; p<0.01 | ||
However, I am not sure if this is the right regression or if we should still include fixed effects of some sort. If I’ve understood correctly, anomalizing AOT accomplishes what FEs had originally been intended to do, which was capture background AOT. But I guess we may also want to capture background PM2.5? So we’ll try that later in the document.
Before that, let’s get predicted PM2.5 and compare the time series from August to Mid-October in the Bay Area.
df$pred_aotanom_smokeday <- predict(mdl0)
df[!is.na(df$height), "pred_aotanom_smokeday_height"] <- predict(mdl1)
df[!is.na(df$pressure), "pred_aotanom_smokeday_pressure"] <- predict(mdl2)
t1 <- as.Date("2020-08-01")
t2 <- as.Date("2020-10-15")
event <- df %>%
filter(t1 <= date, date <= t2, cbsa_name == "San Francisco-Oakland-Hayward, CA") %>%
group_by(date) %>%
summarize(across(c(pm25, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
It look like all the models are capturing the big spike and then overestimating the medium spikes sometimes. Not much difference between the model predictions, except we’re missing predictions where height and pressure are missing due to no match between EPA station and trajectory points.
We regress PM2.5 on fixed effects for EPA station ID and month of the year (we only have 2020 here) to get demeaned PM2.5 values.
df <- dat_merged1
# Get residuals and fitted values
mdlfe <- feols(pm25 ~ 1 | id_epa + month, data = df)
df$pm25_dmn <- mdlfe$resid
df$fv <- mdlfe$fitted.values
Same models as before but now on demeaned PM2.5.
mdl0 <- lm(pm25_dmn ~ aot_anom + aot_anom:smoke_day,
data = df)
mdl1 <- lm(pm25_dmn ~ aot_anom + aot_anom:smoke_day + aot_anom:smoke_day:height,
data = df)
mdl2 <- lm(pm25_dmn ~ aot_anom + aot_anom:smoke_day + aot_anom:smoke_day:pressure,
data = df)
These R2 values now look more like the ones from the last notebook I saw. R2 is slightly larger for models with height or pressure, and model with height does best. All terms are significant.
| Dependent variable: | |||
| pm25_dmn | |||
| (1) | (2) | (3) | |
| aot_anom | -7.600*** | -6.565*** | -6.346*** |
| (1.069) | (1.506) | (1.513) | |
| aot_anom:smoke_day | 44.175*** | 52.799*** | 47.864*** |
| (1.084) | (1.582) | (2.424) | |
| aot_anom:smoke_day:height | -0.006*** | ||
| (0.0003) | |||
| aot_anom:smoke_day:pressure | -0.006*** | ||
| (0.002) | |||
| Constant | -3.423*** | -4.226*** | -4.350*** |
| (0.073) | (0.120) | (0.121) | |
| Observations | 92,536 | 52,928 | 52,928 |
| R2 | 0.252 | 0.264 | 0.257 |
| Adjusted R2 | 0.252 | 0.264 | 0.257 |
| Residual Std. Error | 21.413 (df = 92533) | 25.696 (df = 52924) | 25.815 (df = 52924) |
| F Statistic | 15,589.270*** (df = 2; 92533) | 6,327.997*** (df = 3; 52924) | 6,108.691*** (df = 3; 52924) |
| Note: | p<0.1; p<0.05; p<0.01 | ||
We add back in the fitted values from the FE regression, get predicted PM2.5 (fitted value from FE reg + residual predicted by our models), and compare the time series from August to Mid-October in the Bay Area.
df$pred_dmn_aotanom_smokeday <- predict(mdl0)
df[!is.na(df$height), "pred_dmn_aotanom_smokeday_height"] <- predict(mdl1)
df[!is.na(df$pressure), "pred_dmn_aotanom_smokeday_pressure"] <- predict(mdl2)
df <- df %>%
mutate(across(starts_with("pred_"), as.numeric),
pred_dmn_aotanom_smokeday = pred_dmn_aotanom_smokeday + fv,
pred_dmn_aotanom_smokeday_height = pred_dmn_aotanom_smokeday_height + fv,
pred_dmn_aotanom_smokeday_pressure = pred_dmn_aotanom_smokeday_pressure + fv)
t1 <- as.Date("2020-08-01")
t2 <- as.Date("2020-10-15")
event <- df %>%
filter(t1 <= date, date <= t2, cbsa_name == "San Francisco-Oakland-Hayward, CA") %>%
group_by(date) %>%
summarize(across(c(pm25, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
Models capture the big spike, though now slightly underestimating. But now greatly overestimating the big dip in later September. Overestimating the rise in early September. Overestimating the lower PM2.5 values generally. Still have weird inversions in later August, though not as severe as in the previous graph.
It looks like models with just anomalous AOT and smoke day are able to capture the big spike, so I wonder if height is really needed, especially considering the missingness due to matching, or if I’ve done something wrong somewhere…
We change to using the 100 km cutoff so that we don’t miss observations across models.
# Re-configure merged data to use max cutoff (100 km)
agg <- "idw"
cut <- 100
dat_merged1 <- dat_merged %>% filter(agg_method == agg, cutoff == cut)
We change our baseline model to
and compare with
df <- dat_merged1
mdl0 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day | id_epa + month,
data = df)
mdl1 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:height | id_epa + month,
data = df)
mdl2 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:pressure | id_epa + month,
data = df)
Within R2 is around 0.34 and is highest for model with height, though ultimately not differing much across models. Smoke day term and pressure interaction term are not significant. Interaction term with height is significant, and coefficient goes in expected direction but is small.
etable(mdl0, mdl1, mdl2, drop.section = "fixef")
Let’s get predicted PM2.5 and plot some comparisons.
df$pred_aotanom_smokeday <- predict(mdl0)
df$pred_aotanom_smokeday_height <- predict(mdl1)
df$pred_aotanom_smokeday_pressure <- predict(mdl2)
We filter to smoke days and compare the predictions of each model to observed PM2.5 values. They look very much the same. Seems there’s still quite some unexplained variation in PM2.5.
dfsd <- df %>%
filter(smoke_day == 1) %>%
pivot_longer(starts_with("pred_"), names_to = "mdl")
ggplot(data = dfsd, mapping = aes(x = value, y = pm25)) +
geom_point(aes(color = mdl), alpha = 0.2) +
facet_wrap(vars(mdl))
Here’s a correlation plot to confirm that the predictions are very close across the models.
dfsd <- df %>%
filter(smoke_day == 1) %>%
select(pm25, starts_with("pred_"))
corr_dfsd <- cor(dfsd, use = "complete.obs")
ggcorrplot(corr_dfsd, lab = TRUE)
Let’s try plotting time series where baseline model underestimates.
Looks like Mono county PM2.5 gets underestimated the most.
df$resid0 <- mdl0$residuals
df %>% slice_max(resid0, n = 50) %>% count(county) %>% arrange(desc(n))
The Slink Fire in Mono county was reported on August 29th and last updated October 22nd. So we plot the time series from mid-August through the end of October.
t1 <- as.Date("2020-08-15")
t2 <- as.Date("2020-10-31")
event <- df %>%
filter(t1 <= date, date <= t2, county == "Mono") %>%
group_by(date) %>%
summarize(across(c(pm25, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
In addition to the baseline model underestimating PM2.5, our models with height or pressure don’t quite capture most spikes, nor do they differ much from the baseline model.
Let’s also take a look at our runner-up, Lane county.
The Holiday Farm Fire was reported on September 7th and contained on October 29th. So we plot the time series from late August to mid-November.
t1 <- as.Date("2020-08-22")
t2 <- as.Date("2020-11-15")
event <- df %>%
filter(t1 <= date, date <= t2, county == "Lane") %>%
group_by(date) %>%
summarize(across(c(pm25, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
Our models underestimate PM2.5, but it’s not as bad as above. Also overestimates PM2.5 levels near zero. Model with height captures spike just before hitting the top of the big spike slightly better than the others.
Let’s take a look at a big fire and see how our models perform.
The August Complex started August 17th and was contained on November 15th. It burned in Glenn, Lake, Mendocino, Tehama, Trinity, and Shasta counties. So we plot the time series from early August to the end of November.
t1 <- as.Date("2020-08-01")
t2 <- as.Date("2020-11-30")
event <- df %>%
filter(t1 <= date, date <= t2,
county %in% c("Glenn", "Lake", "Mendocino", "Tehama", "Trinity", "Shasta")) %>%
group_by(date) %>%
summarize(across(c(pm25, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
Our models capture the big spike but noticeably overestimate the spike in late August. They also underestimate early August and overestimate a bit in early September. The model predictions overlap a lot, but at a few spikes the model with height seems to estimate a bit higher than the others.
Perhaps there are better ways of analyzing this, but based on R2 and plots, it seems to me that height isn’t making a big difference from our baseline model.
We change to 30 km cutoff and fit our baseline model and the model with height using the same sample.
# Re-configure merged data to use 30 km cutoff
agg <- "idw"
cut <- 30
dat_merged1 <- dat_merged %>% filter(agg_method == agg, cutoff == cut)
# Drop obs where missing height
dat_merged1 <- dat_merged1 %>% drop_na(height)
df <- dat_merged1
mdl0 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day | id_epa + month,
data = df)
mdl1 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:height | id_epa + month,
data = df)
mdl2 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:pressure | id_epa + month,
data = df)
Compared to using the 100 km cutoff, within R2 is a bit lower here. The model with height does best, though R2 differences are small. Coefficient on term with height seems similar to above, i.e. right direction but small. Coefficient on term with height is smaller in magnitude now. Also just noticed that coefficient on just smoke day term is negative, not sure if that makes sense.
etable(mdl0, mdl1, mdl2, drop.section = "fixef")
Let’s compare the models’ predicted PM2.5.
df$pred_aotanom_smokeday <- predict(mdl0)
df$pred_aotanom_smokeday_height <- predict(mdl1)
df$pred_aotanom_smokeday_pressure <- predict(mdl2)
We filter to smoke days and compare the predictions using their correlations.
dfsd <- df %>%
filter(smoke_day == 1) %>%
select(pm25, starts_with("pred_"))
corr_dfsd <- cor(dfsd, use = "complete.obs")
ggcorrplot(corr_dfsd, lab = TRUE)
Doesn’t seem like 30 km cutoff helped height have larger effect.
I wonder how much the choice of initial height parameters influences our results.
Adding here time series plot of August Complex with only observed PM2.5 and baseline model to help see lines more clearly.
t1 <- as.Date("2020-08-01")
t2 <- as.Date("2020-11-30")
event <- df %>%
select(-pred_aotanom_smokeday_height, -pred_aotanom_smokeday_pressure) %>%
filter(t1 <= date, date <= t2,
county %in% c("Glenn", "Lake", "Mendocino", "Tehama", "Trinity", "Shasta")) %>%
group_by(date) %>%
summarize(across(c(pm25, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
We limit to EPA stations in California since trajectories were initiated in California and may not represent plumes in other states.
# Drop obs not in California
loc_epa <- dat_merged[c("lon_epa", "lat_epa")] %>% distinct()
us_states <- states() %>% as("Spatial")
us_states <- SpatialPolygonsDataFrame(SpatialPolygons(us_states@polygons),
us_states@data %>% select(NAME))
o <- over(SpatialPoints(loc_epa), us_states)
dat_merged0 <- dat_merged %>%
left_join(bind_cols(loc_epa, o)) %>%
rename(state = NAME) %>%
filter(state == "California")
agg <- "idw"
cut <- 30
dat_merged1 <- dat_merged0 %>% filter(agg_method == agg, cutoff == cut)
dat_merged1 <- dat_merged1 %>% drop_na(height)
df <- dat_merged1
mdl0 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day | id_epa + month,
data = df)
mdl1 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:height | id_epa + month,
data =df)
mdl2 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:pressure | id_epa + month,
data = df)
Lost a little over half our obs to filtering for California. Smoke day alone coefficient is positive now. Compared to above, coefficient on anomalous AOT is bigger and coefficient on anomalous AOT and smoke day interaction term is smaller for baseline and height models and larger for pressure model. Interaction term with height is no longer statistically significant. Within R2 is lower now. Model with height still performs better than baseline model, but R2 differences are small, and now model with pressure performs best.
etable(mdl0, mdl1, mdl2, drop.section = "fixef")
We get the models’ predicted PM2.5. We stick to total PM2.5 for now. Plots of specifically smoke PM2.5 will come in next section.
df$pred_aotanom_smokeday <- predict(mdl0)
df$pred_aotanom_smokeday_height <- predict(mdl1)
df$pred_aotanom_smokeday_pressure <- predict(mdl2)
Predictions still very similar, though model with pressure more different than model with height when compared against baseline.
dfsd <- df %>%
filter(smoke_day == 1) %>%
select(pm25, starts_with("pred_"))
corr_dfsd <- cor(dfsd, use = "complete.obs")
ggcorrplot(corr_dfsd, lab = TRUE)
Plotting time series for Bay Area anew.
t1 <- as.Date("2020-08-01")
t2 <- as.Date("2020-10-15")
event <- df %>%
filter(t1 <= date, date <= t2, cbsa_name == "San Francisco-Oakland-Hayward, CA") %>%
group_by(date) %>%
summarize(across(c(pm25, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
Models underestimate the big peak, but model with height and baseline get a bit more of the top spike than does model with pressure. Medium peaks somewhat captured. Lower levels of PM2.5 get overestimated a bit.
Mono county gets underestimated the most again.
df$resid0 <- mdl0$residuals
df %>% slice_max(resid0, n = 50) %>% count(county) %>% arrange(desc(n))
Slink Fire again.
t1 <- as.Date("2020-08-15")
t2 <- as.Date("2020-10-31")
event <- df %>%
filter(t1 <= date, date <= t2, county == "Mono") %>%
group_by(date) %>%
summarize(across(c(pm25, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
Model with pressure captures peaks a bit better. All models overestimate low levels of PM2.5.
August Complex again.
t1 <- as.Date("2020-08-01")
t2 <- as.Date("2020-11-30")
event <- df %>%
filter(t1 <= date, date <= t2,
county %in% c("Glenn", "Lake", "Mendocino", "Tehama", "Trinity", "Shasta")) %>%
group_by(date) %>%
summarize(across(c(pm25, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
Model with pressure captures big peak best. All models overestimate late August peak and underestimate late September peaks.
Now we use IDW average of daily within-trajectory minimum heights within buffer zone. In other words, for each EPA station-day-PM2.5 combination, first found trajectory points on the same day within a set radius, next grouped by trajectory ID and found trajectory point with minimum height (maximum pressure) for each trajectory, and then IDW average the height (pressure) values across trajectories.
# New option "extr" for using within-trajectory extrema
agg <- "extr"
cut <- 30
dat_merged1 <- dat_merged0 %>% filter(agg_method == agg, cutoff == cut)
dat_merged1 <- dat_merged1 %>% drop_na(height)
df <- dat_merged1
mdl0 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day | id_epa + month,
data = df)
mdl1 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:height | id_epa + month,
data = df)
mdl2 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:pressure | id_epa + month,
data = df)
Results are pretty similar to above. So the change in matching method doesn’t seem to make a big difference.
etable(mdl0, mdl1, mdl2, drop.section = "fixef")
Now we predict and plot smoke PM2.5. Not sure if FEs are also supposed to go in here so just not including for now.
smoke_day <- df$smoke_day
aot_anom <- df$aot_anom
height <- df$height
pressure <- df$pressure
coefs <- coefficients(mdl0)
df$pred_aotanom_smokeday <- coefs["smoke_day"]*smoke_day +
coefs["aot_anom:smoke_day"]*aot_anom*smoke_day
coefs <- coefficients(mdl1)
df$pred_aotanom_smokeday_height <- coefs["smoke_day"]*smoke_day +
coefs["aot_anom:smoke_day"]*aot_anom*smoke_day +
coefs["aot_anom:smoke_day:height"]*aot_anom*smoke_day*height
coefs <- coefficients(mdl2)
df$pred_aotanom_smokeday_pressure <- coefs["smoke_day"]*smoke_day +
coefs["aot_anom:smoke_day"]*aot_anom*smoke_day +
coefs["aot_anom:smoke_day:pressure"]*aot_anom*smoke_day*pressure
We filter to smoke days and compare the predictions of each model to total PM2.5 values. Pressure model seems to do slightly better, but predictions don’t get past 150 micrograms, so seems we probably won’t be able to capture big spikes.
dfsd <- df %>%
filter(smoke_day == 1) %>%
pivot_longer(starts_with("pred_"), names_to = "mdl")
ggplot(data = dfsd, mapping = aes(x = value, y = pm25)) +
geom_point(aes(color = mdl), alpha = 0.2) +
facet_wrap(vars(mdl))
Bay Area again. Since we’re predicting smoke PM2.5 and not total PM2.5, predictions are lower now. Pressure model does worse than baseline and height models in late September/early October, but otherwise the models are pretty similar.
t1 <- as.Date("2020-08-01")
t2 <- as.Date("2020-10-15")
event <- df %>%
filter(t1 <= date, date <= t2, cbsa_name == "San Francisco-Oakland-Hayward, CA") %>%
group_by(date) %>%
summarize(across(c(pm25, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
Pressure model predictions differ from baseline model predictions by at most about 61 micrograms, though this is uncommon.
diff_height <- df$pred_aotanom_smokeday_height - df$pred_aotanom_smokeday
summary(diff_height)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-16.45710 0.00000 0.00000 -0.01706 0.00000 10.47510
quantile(diff_height, c(0.01, 0.05, 0.95, 0.99))
1% 5% 95% 99%
-2.9539452 -0.7675867 0.6609877 2.7859131
diff_pressure <- df$pred_aotanom_smokeday_pressure - df$pred_aotanom_smokeday
summary(diff_pressure)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-52.71067 0.00000 0.00000 0.05918 0.00000 61.10585
quantile(diff_pressure, c(0.01, 0.05, 0.95, 0.99))
1% 5% 95% 99%
-9.887987 -2.609854 2.780110 11.325619
From the above, height doesn’t generally seem to help much.
Also took a look at Brey et al. again and a few questions popped up since we follow them up to initializing trajectories but not after that:
EPA sometimes records multiple PM2.5 values for the same station-day, so let’s try averaging each station-day’s PM2.5 values and see if that changes anything.
agg <- "idw"
cut <- 30
dat_merged1 <- dat_merged0 %>%
filter(agg_method == agg, cutoff == cut) %>%
drop_na(height)
dat_merged1 <- dat_merged1 %>%
group_by(across(c(-pm25))) %>%
summarize(pm25 = mean(pm25, na.rm = TRUE)) %>%
ungroup()
# dfpm <- dat_merged1 %>%
# group_by(date, id_epa) %>%
# summarize(pm25 = mean(pm25)) %>%
# ungroup()
# dat_merged1 <- dat_merged1 %>%
# select(-pm25) %>%
# distinct() %>%
# left_join(dfpm)
df <- dat_merged1
mdl0 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day | id_epa + month,
data = df)
mdl1 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:height | id_epa + month,
data = df)
mdl2 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:pressure | id_epa + month,
data = df)
Regression results look quite similar to above.
etable(mdl0, mdl1, mdl2, drop.section = "fixef")
df$pred_aotanom_smokeday <- predict(mdl0)
df$pred_aotanom_smokeday_height <- predict(mdl1)
df$pred_aotanom_smokeday_pressure <- predict(mdl2)
Bay area time series plot for comparison. Looks similar to above.
t1 <- as.Date("2020-08-01")
t2 <- as.Date("2020-10-15")
event <- df %>%
filter(t1 <= date, date <= t2, cbsa_name == "San Francisco-Oakland-Hayward, CA") %>%
group_by(date) %>%
summarize(across(c(pm25, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
Slink Fire looks similar to above.
t1 <- as.Date("2020-08-15")
t2 <- as.Date("2020-10-31")
event <- df %>%
filter(t1 <= date, date <= t2, county == "Mono") %>%
group_by(date) %>%
summarize(across(c(pm25, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
# dat_merged0 <- dat_merged0 %>%
# group_by(across(c(-pm25))) %>%
# summarize(pm25 = mean(pm25, na.rm = TRUE)) %>%
# ungroup()
We throw away trajectories that don’t overlap plumes in their first 48 hours, following Brey et al. This acts as a check on the trajectories, especially given uncertainty about injection heights, to make sure they’re really modeling the transport of smoke particles.
# Read in merged data based on trajectories that overlap plume in first 48 hours
# This is average station-day PM2.5 already
dat_merged48 <- readRDS(paste0(path_dropbox, "hms_hysplit/trajectories/trajectories_EPA_smoke_AOT_CA_2020_48hour.rds")) %>%
mutate(date = as.Date(date),
year = year(date),
month = month(date),
day = day(date))
# Drop obs not in California
loc_epa <- dat_merged48[c("lon_epa", "lat_epa")] %>% distinct()
o <- over(SpatialPoints(loc_epa), us_states)
dat_merged00 <- dat_merged48 %>%
left_join(bind_cols(loc_epa, o)) %>%
rename(state = NAME) %>%
filter(state == "California")
Joining, by = c("lon_epa", "lat_epa")
agg <- "idw"
cut <- 30
dat_merged1 <- dat_merged00 %>%
filter(agg_method == agg, cutoff == cut) %>%
drop_na(height)
How much data did we throw out? 11,936 trajectories, which is 18.5% of all the 72-hour trajectories that we had generated.
# Compare linked and overlap48 number of traj IDs
dat_traj1 <- readRDS(paste0(path_dropbox, "hms_hysplit/trajectories/trajectories_CA_2020_linked.rds"))
dat_traj2 <- readRDS(paste0(path_dropbox, "hms_hysplit/trajectories/trajectories_CA_2020_overlap48.rds"))
ntraj1 <- length(unique(dat_traj1$id_traj))
ntraj2 <- length(unique(dat_traj2$id_traj))
ntraj1 - ntraj2
[1] 11936
(ntraj1 - ntraj2)/ntraj1
[1] 0.1849768
With fewer trajectories, we also get fewer matches to EPA station-days. For the same method of IDW averaging height values within 30 km each station-day, we go from having ~21.5K obs to now ~10K obs.
Does throwing out these trajectories change our results? The magnitudes of our coefficients change a bit from above (bigger on anomalous AOT, smaller on smoke_day, smaller on AOT_anom*smoke_day), but overall I would say the results still look pretty similar to what we’ve been seeing. In particular, within R2 is around 0.23-0.25, highest for pressure model, and interaction terms with height or pressure both are not significant.
df <- dat_merged1
mdl0 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day | id_epa + month,
data = df)
mdl1 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:height | id_epa + month,
data = df)
mdl2 <- feols(pm25 ~ aot_anom + smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:pressure | id_epa + month,
data = df)
etable(mdl0, mdl1, mdl2, drop.section = "fixef")
df$pred_aotanom_smokeday <- predict(mdl0)
df$pred_aotanom_smokeday_height <- predict(mdl1)
df$pred_aotanom_smokeday_pressure <- predict(mdl2)
Model predictions are similar to each other.
dfsd <- df %>%
filter(smoke_day == 1) %>%
select(pm25, starts_with("pred_"))
corr_dfsd <- cor(dfsd, use = "complete.obs")
ggcorrplot(corr_dfsd, lab = TRUE)
dfsd <- df %>%
filter(smoke_day == 1) %>%
pivot_longer(starts_with("pred_"), names_to = "mdl")
ggplot(data = dfsd, mapping = aes(x = value, y = pm25)) +
geom_point(aes(color = mdl), alpha = 0.2) +
facet_wrap(vars(mdl))
Time series plots look similar to above. Bay Area:
t1 <- as.Date("2020-08-01")
t2 <- as.Date("2020-10-15")
event <- df %>%
filter(t1 <= date, date <= t2, cbsa_name == "San Francisco-Oakland-Hayward, CA") %>%
group_by(date) %>%
summarize(across(c(pm25, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
Slink Fire:
t1 <- as.Date("2020-08-15")
t2 <- as.Date("2020-10-31")
event <- df %>%
filter(t1 <= date, date <= t2, county == "Mono") %>%
group_by(date) %>%
summarize(across(c(pm25, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
August Complex:
t1 <- as.Date("2020-08-01")
t2 <- as.Date("2020-11-30")
event <- df %>%
filter(t1 <= date, date <= t2,
county %in% c("Glenn", "Lake", "Mendocino", "Tehama", "Trinity", "Shasta")) %>%
group_by(date) %>%
summarize(across(c(pm25, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
In short, we threw out half our sample due to throwing out trajectories that don’t overlap plume in the first 48 hours, but update 4 results did not change much from update 3 results.
We want to know: does height/pressure from HYSPLIT trajectories improve daily smoke PM2.5 prediction?
We put some data together:
Sample size ends up around 10K station-days.
We fit some models:
which we compare with
We get some regression results:
We compare model predictions:
Have we exhaustively convinced ourselves that height doesn’t help? We might consider some further options:
Some questions that have popped up:
Now that we have background PM2.5, we can use measures of smoke PM2.5 on LHS instead of total PM2.5. Smoke PM2.5 = total PM2.5 - background PM2.5.
# Read in data with background PM2.5 and get anomalous PM2.5
dat_covariates <- readRDS("/Users/jessssli/BurkeLab Dropbox/Data/PM25/epa_station_covariates.rds") %>%
mutate(pm25_anom = pm25 - pm25_med_3yr)
dat_merged_smoke <- dat_merged00 %>%
left_join(dat_covariates %>% select(epa_id, date, pm25, pm25_med_3yr, pm25_anom),
by = c("id_epa" = "epa_id", "date", "pm25"))
agg <- "idw"
cut <- 30
dat_merged1 <- dat_merged_smoke %>%
filter(agg_method == agg, cutoff == cut) %>%
drop_na(height, pm25_anom)
Distribution of anomalous PM2.5 values.
hist(dat_merged1$pm25_anom)
Now we model as
with fixed effects for station ID and month (only have 2020 so no year FE).
df <- dat_merged1
mdl0 <- feols(pm25_anom ~ smoke_day + aot_anom:smoke_day | id_epa + month,
data = df)
mdl1 <- feols(pm25_anom ~ smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:height | id_epa + month,
data = df)
mdl2 <- feols(pm25_anom ~ smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:pressure | id_epa + month,
data = df)
Regression results look pretty similar to what we’ve been seeing above. Height and pressure terms not significant, both negative, and within R2 around 0.23-0.24, with pressure model performing best by a bit.
etable(mdl0, mdl1, mdl2, drop.section = "fixef")
smoke_day <- df$smoke_day
aot_anom <- df$aot_anom
height <- df$height
pressure <- df$pressure
coefs <- coefficients(mdl0)
df$pred_baseline <- coefs["smoke_day"]*smoke_day +
coefs["smoke_day:aot_anom"]*aot_anom*smoke_day
coefs <- coefficients(mdl1)
df$pred_height <- coefs["smoke_day"]*smoke_day +
coefs["smoke_day:aot_anom"]*aot_anom*smoke_day +
coefs["smoke_day:aot_anom:height"]*aot_anom*smoke_day*height
coefs <- coefficients(mdl2)
df$pred_pressure <- coefs["smoke_day"]*smoke_day +
coefs["smoke_day:aot_anom"]*aot_anom*smoke_day +
coefs["smoke_day:aot_anom:pressure"]*aot_anom*smoke_day*pressure
Model predictions are similar to each other.
dfsd <- df %>%
filter(smoke_day == 1) %>%
select(pm25_anom, starts_with("pred_"))
corr_dfsd <- cor(dfsd, use = "complete.obs")
ggcorrplot(corr_dfsd, lab = TRUE)
dfsd <- df %>%
filter(smoke_day == 1) %>%
pivot_longer(starts_with("pred_"), names_to = "mdl")
ggplot(data = dfsd, mapping = aes(x = value, y = pm25_anom)) +
geom_point(aes(color = mdl), alpha = 0.2) +
facet_wrap(vars(mdl))
Bay Area times series plot looks similar to above.
t1 <- as.Date("2020-08-01")
t2 <- as.Date("2020-10-15")
event <- df %>%
filter(t1 <= date, date <= t2, cbsa_name == "San Francisco-Oakland-Hayward, CA") %>%
group_by(date) %>%
summarize(across(c(pm25_anom, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25_anom, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
Slink Fire time series plot looks like the predictions just shifted down by 50-100 micrograms.
t1 <- as.Date("2020-08-15")
t2 <- as.Date("2020-10-31")
event <- df %>%
filter(t1 <= date, date <= t2, county == "Mono") %>%
group_by(date) %>%
summarize(across(c(pm25_anom, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25_anom, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
August Complex time series plot also looks like predictions shifted down, except early August shifted up.
t1 <- as.Date("2020-08-01")
t2 <- as.Date("2020-11-30")
event <- df %>%
filter(t1 <= date, date <= t2,
county %in% c("Glenn", "Lake", "Mendocino", "Tehama", "Trinity", "Shasta")) %>%
group_by(date) %>%
summarize(across(c(pm25_anom, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25_anom, starts_with("pred_")), names_to = "mdl")
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = mdl)) +
theme_classic()
So based on smoke PM2.5 as anomalous PM2.5, results still seem mostly like what we’ve been seeing, main difference being that predictions shifted to 0 some.
Now some exploration of height values by injection height.
What is the distribution of height across trajectory points by injection height?
dat_overlap48 <- dat_traj2
id_overlap48 <- unique(dat_overlap48$id_traj)
dat_linked <- dat_traj1
dat_traj <- dat_linked %>%
select(-id_hysplit) %>%
distinct() %>%
filter(id_traj %in% id_overlap48) %>%
mutate(height_i = as.factor(height_i))
ggplot(data = dat_traj, mapping = aes(x = height, group = height_i, fill = height_i)) +
geom_density(alpha = 0.5) +
geom_vline(xintercept = c(500, 1500, 2500), linetype = 2) +
theme_classic() +
labs(title = "Trajectory Points", x = "mAGL", y = "Density", fill = "Injection Height")
As we might expect, height distributions depend on injection height and peak at their respective injection heights (black dotted lines). Also, there are lots of height values at 0. Perhaps particles tend to stay on the ground once they’ve hit the ground.
How does height vary over the course of a 72-hour trajectory? How does this depend on injection height?
The plot below shows height at the hour along the trajectory for each trajectory (gray) in a random sample and the average height at the hour along across all trajectories grouped by injection height.
dat_traj3 <- dat_traj %>%
group_by(height_i, hour_along) %>%
summarize(height = mean(height)) %>%
ungroup()
`summarise()` has grouped output by 'height_i'. You can override using the `.groups` argument.
set.seed(123)
rand <- dat_traj %>%
pull(id_traj) %>%
unique() %>%
sample(200)
ggplot(data = dat_traj %>% filter(id_traj %in% rand), mapping = aes(x = hour_along, y = height, group = id_traj)) +
geom_line(alpha = 0.2, color = "gray") +
geom_line(data = dat_traj3, mapping = aes(x = hour_along, y = height, group = height_i, color = height_i)) +
facet_wrap(vars(height_i)) +
geom_hline(yintercept = c(500, 1500, 2500), linetype = 2) +
theme_classic()
Trajectories initiated at 500 mAGL seem most likely to hit the ground. Average trajectories seem to be rising in hour along.
Are our height or pressure models sensitive to injection height? Are smoke PM2.5 predictions sensitive to injection height?
We drop out one injection height at a time and compare. So before we used trajectories initiated at 500, 1500, and 2500 mAGL. Now we also run models using trajectories initiated at 500 or 1500 mAGL, 500 or 2500 mgAGL, and 1500 or 2500 mAGL.
dat_iheight <- readRDS(paste0(path_dropbox, "hms_hysplit/trajectories/trajectories_EPA_smoke_AOT_CA_2020_48hour_iheight.rds")) %>%
mutate(date = as.Date(date),
year = year(date),
month = month(date),
day = day(date))
# Drop obs not in California
loc_epa <- dat_iheight[c("lon_epa", "lat_epa")] %>% distinct()
o <- over(SpatialPoints(loc_epa), us_states)
dat_iheight <- dat_iheight %>%
left_join(bind_cols(loc_epa, o)) %>%
rename(state = NAME) %>%
filter(state == "California")
Joining, by = c("lon_epa", "lat_epa")
dat_iheight_smoke <- dat_iheight %>%
left_join(dat_covariates %>% select(epa_id, date, pm25, pm25_med_3yr, pm25_anom),
by = c("id_epa" = "epa_id", "date", "pm25"))
dat_iheight_smoke <- dat_iheight_smoke %>%
bind_rows(dat_merged_smoke %>% mutate(injection_heights = "500+1500+2500"))
agg <- "idw"
cut <- 30
dat_merged1 <- dat_iheight_smoke %>%
filter(agg_method == agg, cutoff == cut) %>%
drop_na(height, pm25_anom)
# Get to same sample across injection height combinations
id012 <- dat_merged1 %>%
filter(injection_heights == "500+1500+2500") %>%
select(date, id_epa) %>%
distinct()
id01 <- dat_merged1 %>%
filter(injection_heights == "500+1500") %>%
select(date, id_epa) %>%
distinct()
id02 <- dat_merged1 %>%
filter(injection_heights == "500+2500") %>%
select(date, id_epa) %>%
distinct()
id12 <- dat_merged1 %>%
filter(injection_heights == "1500+2500") %>%
select(date, id_epa) %>%
distinct()
id_common <- reduce(list(id012, id01, id02, id12), inner_join)
Joining, by = c("date", "id_epa")
Joining, by = c("date", "id_epa")
Joining, by = c("date", "id_epa")
dat_merged1 <- dat_merged1 %>%
filter(paste(date, id_epa) %in% paste(id_common$date, id_common$id_epa))
df <- dat_merged1
mdl0012 <- feols(pm25_anom ~ smoke_day + aot_anom:smoke_day | id_epa + month,
data = df %>% filter(injection_heights == "500+1500+2500"))
mdl1012 <- feols(pm25_anom ~ smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:height | id_epa + month,
data = df %>% filter(injection_heights == "500+1500+2500"))
mdl2012 <- feols(pm25_anom ~ smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:pressure | id_epa + month,
data = df %>% filter(injection_heights == "500+1500+2500"))
mdl001 <- feols(pm25_anom ~ smoke_day + aot_anom:smoke_day | id_epa + month,
data = df %>% filter(injection_heights == "500+1500"))
mdl101 <- feols(pm25_anom ~ smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:height | id_epa + month,
data = df %>% filter(injection_heights == "500+1500"))
mdl201 <- feols(pm25_anom ~ smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:pressure | id_epa + month,
data = df %>% filter(injection_heights == "500+1500"))
mdl002 <- feols(pm25_anom ~ smoke_day + aot_anom:smoke_day | id_epa + month,
data = df %>% filter(injection_heights == "500+2500"))
mdl102 <- feols(pm25_anom ~ smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:height | id_epa + month,
data = df %>% filter(injection_heights == "500+2500"))
mdl202 <- feols(pm25_anom ~ smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:pressure | id_epa + month,
data = df %>% filter(injection_heights == "500+2500"))
mdl012 <- feols(pm25_anom ~ smoke_day + aot_anom:smoke_day | id_epa + month,
data = df %>% filter(injection_heights == "1500+2500"))
mdl112 <- feols(pm25_anom ~ smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:height | id_epa + month,
data = df %>% filter(injection_heights == "1500+2500"))
mdl212 <- feols(pm25_anom ~ smoke_day + aot_anom:smoke_day + aot_anom:smoke_day:pressure | id_epa + month,
data = df %>% filter(injection_heights == "1500+2500"))
(I wonder what’s a better way to output feols results than etable…)
mdl0 = baseline model
mdl1 = height model
mdl2 = pressure model
mdl[0-2]012 = 500+1500+2500 mAGL injection heights (all as before)
mdl[0-2]01 = 500+1500 mAGL injection heights
mdl[0-2]02 = 500+2500
mdl[0-2]12 = 1500+2500
Sample is ~3K obs smaller compared to above. Within R2 ranges 0.21-0.24, with pressure model performing best by a bit in each case. Smoke day term not as large as before and no longer significant. Height and pressure term coefficients both usually negative and usually not significant. Height term coefficient is positive in model using 500+1500 mAGL, which is not direction I would expect. Height term is significant in model using 1500+2500 mAGL.
etable(mdl0012, mdl1012, mdl2012,
mdl001, mdl101, mdl201,
mdl002, mdl102, mdl202,
mdl012, mdl112, mdl212,
drop.section = "fixef")
smoke_day <- df$smoke_day
aot_anom <- df$aot_anom
height <- df$height
pressure <- df$pressure
df$pred_baseline <- NA
df$pred_height <- NA
df$pred_pressure <- NA
coefs <- coefficients(mdl0012)
i <- df$injection_heights == "500+1500+2500"
df$pred_baseline[i] <- (coefs["smoke_day"]*smoke_day +
coefs["smoke_day:aot_anom"]*aot_anom*smoke_day)[i]
coefs <- coefficients(mdl1012)
df$pred_height[i] <- (coefs["smoke_day"]*smoke_day +
coefs["smoke_day:aot_anom"]*aot_anom*smoke_day +
coefs["smoke_day:aot_anom:height"]*aot_anom*smoke_day*height)[i]
coefs <- coefficients(mdl2012)
df$pred_pressure[i] <- (coefs["smoke_day"]*smoke_day +
coefs["smoke_day:aot_anom"]*aot_anom*smoke_day +
coefs["smoke_day:aot_anom:pressure"]*aot_anom*smoke_day*pressure)[i]
coefs <- coefficients(mdl001)
i <- df$injection_heights == "500+1500"
df$pred_baseline[i] <- (coefs["smoke_day"]*smoke_day +
coefs["smoke_day:aot_anom"]*aot_anom*smoke_day)[i]
coefs <- coefficients(mdl101)
df$pred_height[i] <- (coefs["smoke_day"]*smoke_day +
coefs["smoke_day:aot_anom"]*aot_anom*smoke_day +
coefs["smoke_day:aot_anom:height"]*aot_anom*smoke_day*height)[i]
coefs <- coefficients(mdl201)
df$pred_pressure[i] <- (coefs["smoke_day"]*smoke_day +
coefs["smoke_day:aot_anom"]*aot_anom*smoke_day +
coefs["smoke_day:aot_anom:pressure"]*aot_anom*smoke_day*pressure)[i]
coefs <- coefficients(mdl002)
i <- df$injection_heights == "500+2500"
df$pred_baseline[i] <- (coefs["smoke_day"]*smoke_day +
coefs["smoke_day:aot_anom"]*aot_anom*smoke_day)[i]
coefs <- coefficients(mdl102)
df$pred_height[i] <- (coefs["smoke_day"]*smoke_day +
coefs["smoke_day:aot_anom"]*aot_anom*smoke_day +
coefs["smoke_day:aot_anom:height"]*aot_anom*smoke_day*height)[i]
coefs <- coefficients(mdl202)
df$pred_pressure[i] <- (coefs["smoke_day"]*smoke_day +
coefs["smoke_day:aot_anom"]*aot_anom*smoke_day +
coefs["smoke_day:aot_anom:pressure"]*aot_anom*smoke_day*pressure)[i]
coefs <- coefficients(mdl012)
i <- df$injection_heights == "1500+2500"
df$pred_baseline[i] <- (coefs["smoke_day"]*smoke_day +
coefs["smoke_day:aot_anom"]*aot_anom*smoke_day)[i]
coefs <- coefficients(mdl112)
df$pred_height[i] <- (coefs["smoke_day"]*smoke_day +
coefs["smoke_day:aot_anom"]*aot_anom*smoke_day +
coefs["smoke_day:aot_anom:height"]*aot_anom*smoke_day*height)[i]
coefs <- coefficients(mdl212)
df$pred_pressure[i] <- (coefs["smoke_day"]*smoke_day +
coefs["smoke_day:aot_anom"]*aot_anom*smoke_day +
coefs["smoke_day:aot_anom:pressure"]*aot_anom*smoke_day*pressure)[i]
Predictions for height and pressure models by injection height compared against anomalous PM2.5.
dfsd <- df %>%
filter(smoke_day == 1) %>%
pivot_longer(starts_with("pred_"), names_to = "mdl")
ggplot(data = dfsd %>% filter(mdl != "pred_baseline"), mapping = aes(x = value, y = pm25_anom)) +
geom_point(aes(color = injection_heights), alpha = 0.2) +
facet_wrap(vars(mdl))
# geom_point(aes(color = interaction(mdl, injection_heights)), alpha = 0.2)# +
# facet_wrap(vars(mdl, injection_heights))
Injection height doesn’t seem to make too much difference in smoke PM2.5 maybe makes a bit more of difference in the pressure model than the height model.
Bay Area:
t1 <- as.Date("2020-08-01")
t2 <- as.Date("2020-10-15")
event <- df %>%
filter(t1 <= date, date <= t2, cbsa_name == "San Francisco-Oakland-Hayward, CA") %>%
group_by(date, injection_heights) %>%
summarize(across(c(pm25_anom, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25_anom, starts_with("pred_")), names_to = "mdl") %>%
filter(mdl %in% c("pred_height", "pred_pressure"))
`summarise()` has grouped output by 'date'. You can override using the `.groups` argument.
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = injection_heights)) +
theme_classic() +
facet_wrap(vars(mdl))
Slink Fire:
t1 <- as.Date("2020-08-15")
t2 <- as.Date("2020-10-31")
event <- df %>%
filter(t1 <= date, date <= t2, county == "Mono") %>%
group_by(date, injection_heights) %>%
summarize(across(c(pm25_anom, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25_anom, starts_with("pred_")), names_to = "mdl") %>%
filter(mdl %in% c("pred_height", "pred_pressure"))
`summarise()` has grouped output by 'date'. You can override using the `.groups` argument.
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = injection_heights)) +
theme_classic() +
facet_wrap(vars(mdl))
August Complex:
t1 <- as.Date("2020-08-01")
t2 <- as.Date("2020-11-30")
event <- df %>%
filter(t1 <= date, date <= t2,
county %in% c("Glenn", "Lake", "Mendocino", "Tehama", "Trinity", "Shasta")) %>%
group_by(date, injection_heights) %>%
summarize(across(c(pm25_anom, starts_with("pred")), mean, na.rm = TRUE)) %>%
pivot_longer(c(pm25_anom, starts_with("pred_")), names_to = "mdl") %>%
filter(mdl %in% c("pred_height", "pred_pressure"))
`summarise()` has grouped output by 'date'. You can override using the `.groups` argument.
ggplot(data = event, mapping = aes(x = date, y = value)) +
geom_line(mapping = aes(color = injection_heights)) +
theme_classic() +
facet_wrap(vars(mdl))
So dropping one injection height out at a time, doesn’t seem like models perform much different.